Abstract: The main objective of this paper is to classify face images using HSV color features. A content-based
image retrieval (CBIR) system is presented that retrieves facial images from the extracted facial features.
The primary principle of CBIR in retrieving face images is to retrieve almost all relevant images while
minimizing the number of irrelevant images. This can be achieved with the help of a clustering algorithm. When
a query image is searched, the first step determines the nearest cluster and the second step computes the
distances between the query image and the target images assigned to that cluster. Finally, images that are
similar to the query image are retrieved and displayed. The experimental result is compared with retrieval
based on the Euclidean distance metric alone; the clustering technique produces more accurate image retrieval
and better classification of images.
I. Introduction

The common image retrieval system for retrieving images
from a large database of digital images relies on metadata
or keywords. Manual image annotation is time consuming;
locating a desired image in a small database is feasible,
whereas large databases need more effective techniques.
Content-based image retrieval (CBIR) systems are very useful
and efficient when the images are classified on the basis of
particular aspects. CBIR is a technique that uses visual
contents, normally known as features, to search images in
large-scale image databases according to users' requests in
the form of a query image. Given a query image, the task is
to find visually similar images in an image database. Each
image is represented by a feature vector, and if the distance
between two vectors is smaller than a threshold, we get one
match. Visual contents are colors, shapes, textures, objects,
or metadata (e.g., tags) derived from images. CBIR operates
on a totally different principle from keyword search,
retrieving stored images from a collection by comparing
features automatically extracted from the images themselves.
The most common features are mathematical measures of color,
texture or shape. A more recent review reports tremendous
growth in CBIR techniques. Applications of CBIR systems to
medical domains already exist, although most of the systems
currently available are based on radiological images. Most of
the work in dermatology has focused on skin cancer detection.
Different techniques for segmentation, feature extraction and
classification have been reported by several authors. Here,
face images are retrieved using CBIR techniques, which
provide fast retrieval.
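
The threshold test mentioned in this section can be illustrated with a minimal sketch. The three-element feature vectors, the image names and the threshold value of 0.1 are assumptions made for illustration only; the paper does not specify them.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def find_matches(query, database, threshold):
    """Return (name, distance) pairs closer than the threshold, nearest first."""
    hits = []
    for name, features in database.items():
        d = euclidean(query, features)
        if d < threshold:
            hits.append((name, d))
    return sorted(hits, key=lambda pair: pair[1])

# Hypothetical feature vectors (for example mean H, S, V of each image).
database = {
    "face_a": [0.05, 0.40, 0.80],
    "face_b": [0.08, 0.43, 0.77],
    "face_c": [0.60, 0.90, 0.30],
}
query = [0.06, 0.41, 0.79]
matches = find_matches(query, database, threshold=0.1)
```

With these values, only the two vectors close to the query count as matches; the third is rejected because its distance exceeds the threshold.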

Facial image retrieval retrieves images based on
information extracted from human faces. It is a specific
problem of content-based image retrieval and has great
potential in various applications, such as Human-Computer
Interaction (HCI), digital video processing and visual
surveillance. Due to the rising popularity of digital cameras
and digital albums, retrieving images of human faces has
become an interesting problem. In the literature, Gudivada &
Raghavan (1997) proposed a framework for image retrieval
systems and implemented a feature-based face retrieval
system; Satoh et al. (1999) built a face retrieval and
recognition system called "Name It" for video content
analysis, based on an eigen-face method for face recognition.
Eickeler (2002) used a pseudo 2D Hidden Markov Model to
retrieve faces from a face database. Most current face
retrieval systems are based on face recognition, i.e.,
retrieval by identity. Besides identity, the face carries a
lot of other useful category information, such as gender, age,
ethnicity and even expression. It would be helpful to use
clues other than identity to retrieve facial images.

II. Literature Review 

Although early systems already existed at the beginning of
the 1980s [Chang & Fu, 1980], most would cite systems such
as IBM's QBIC (Query by Image Content) as the start of
content-based image retrieval [Niblack et al., 1993; Ritendra
Datta et al., 2008]. QBIC lets users retrieve images by
color, shape and texture, and provides several query
methods: Simple Query, Multi-Feature Query and Multi-Pass
Query. A few techniques have used global color and texture
features [Niblack et al., 1993; Pentland, 1994; Markus
Stricker & Markus Orengo, 1995], whereas others have used
local color and texture features [Natsev et al., 1999; Jia
Li et al., 2000; Carson et al., 2002; Chen & Wang, 2002]. The
latter approach segments the image into regions based on
color and texture features. The regions are close to human
perception and are used as the basic building blocks for
feature computation and similarity measurement. These
systems are called region-based image retrieval (RBIR)
systems and have proven to be more efficient in terms of
retrieval performance. Traditional text-based image search
engines rely on manual annotation of images and use
text-based retrieval methods. Image annotation in text-based
retrieval has two limitations: it does not scale to large
databases, and the annotations are valid for only one
language. Content-based retrieval should not have these
limitations. Text-based retrieval is further limited by the
subjectivity of human perception and by the burden it places
on the end user. Some queries cannot be described in words at
all, but can be expressed through the visual features of
images. Among image retrieval methods, CBIR is an approach
that relies exclusively on the visual features of the images,
such as color histogram, texture, shape, and so forth. One
obvious advantage of CBIR over other methods, e.g.,
text-based image retrieval (TBIR), is that it can be fully
automatic, since the visual features are extracted
automatically.

Image classification is treated as a preprocessing step for
speeding up image retrieval in large databases and improving
accuracy, or for performing automatic image annotation.
Image clustering inherently depends on a similarity measure.
Image classification is often followed by a step of
similarity measurement, restricted to those images in a large
database that belong to the same visual class as predicted
for the query. In such cases, the retrieval process is
intertwined, whereby classification and similarity matching
steps together form the retrieval process. Similar arguments
hold for clustering as well, which is why, in many cases, it
is also a fundamental "early" step in image retrieval
[Ritendra Datta et al., 2008]. The K-Means clustering
procedure is applied for scalable image retrieval from large
databases. K-Means is a fast iterative improvement heuristic.
Because its result depends on the initial seed points, a
common method is to run the algorithm several times and keep
the best clustering found.

Color is one of the most widely used features for image
similarity retrieval. Color retrieval yields the best results,
in that the computed color similarity is close to that
derived by the human visual system, which is capable of
differentiating between a very large number of colors.
One of the main aspects of color feature extraction is the
choice of a color space. A color space is a multidimensional
space in which the different dimensions represent the
different components of color [Daniela Stan & Ishwar K.
Sethi, 2001]. Most color spaces are three dimensional.
An example of a color space is RGB, which assigns to each
pixel a three-element vector giving the intensities of the
three primary colors, red, green and blue. The space spanned
by the R, G, and B values completely describes visible colors,
which are represented as vectors in the 3D RGB color space.
As a result, the RGB color space provides a useful starting
point for representing color features of images. For color
feature extraction, the HSV color space is used because it is
quite similar to the way humans perceive and describe color,
which is not always the case for the RGB color space.

III. Color Space 

A color space is defined as a model for representing color in
terms of intensity values in a space of one to four
dimensions. A color component, or color channel, is one of
the dimensions. In this work, the HSV color space is used.

3.1. HSV Color Space 

HSV stands for hue, saturation, and value. The value
represents the intensity of a color, which is decoupled from
the color information in the represented image. The hue and
saturation components are intimately related to the way the
human eye perceives color, which gives image processing
algorithms a physiological basis. As hue varies from 0 to
1.0, the corresponding colors vary from red, through yellow,
green, cyan, blue, and magenta, back to red, so that there are
actually red values both at 0 and 1.0. As saturation varies
from 0 to 1.0, the corresponding colors (hues) vary from
unsaturated (shades of gray) to fully saturated (no white
component). As value, or brightness, varies from 0 to 1.0, the
corresponding colors become increasingly brighter. The HSV
coordinate system model and the HSV color are shown in
figure 1 and figure 2.
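
The behaviour of the three components can be checked with a short sketch using Python's standard `colorsys` module, which works with H, S and V in the range 0 to 1.0. The helper name `hsv_as_rgb` is hypothetical and only rounds the result for readability.

```python
import colorsys

def hsv_as_rgb(h, s, v):
    """Convert HSV (each in 0..1) to an (R, G, B) tuple in 0..1, rounded."""
    return tuple(round(c, 3) for c in colorsys.hsv_to_rgb(h, s, v))

red_at_zero = hsv_as_rgb(0.0, 1.0, 1.0)  # hue 0 is red
red_at_one = hsv_as_rgb(1.0, 1.0, 1.0)   # hue 1.0 wraps back to red
cyan = hsv_as_rgb(0.5, 1.0, 1.0)         # halfway around the hue circle
gray = hsv_as_rgb(0.3, 0.0, 0.5)         # zero saturation gives a shade of gray
dark_red = hsv_as_rgb(0.0, 1.0, 0.25)    # lower value gives a darker color
```

Hue 0 and hue 1.0 produce the same red, zero saturation yields equal R, G and B regardless of hue, and lowering the value darkens the color without changing its hue.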




Figure 1 - HSV Coordinate System Model

[Block diagram labels: HSV Color Feature Extraction; Clustering of Image; Image Data Base]



3.2. Color Conversion 

To work in a suitable color space, a conversion between color
spaces is needed that preserves the perceived color
differences.

3.2.1. RGB to HSV Conversion 

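Initially, the R, G, B values are divided by 255 to change the
range from 0...255 to 0...1:

$$R' = R/255, \quad G' = G/255, \quad B' = B/255$$

$$C_{max} = \max(R', G', B'), \quad C_{min} = \min(R', G', B'), \quad \Delta = C_{max} - C_{min}$$

A. Hue Calculation

The hue value can be calculated using the following formula:

$$
H = \begin{cases}
0^\circ, & \Delta = 0 \\
60^\circ \times \left( \dfrac{G' - B'}{\Delta} \bmod 6 \right), & C_{max} = R' \\
60^\circ \times \left( \dfrac{B' - R'}{\Delta} + 2 \right), & C_{max} = G' \\
60^\circ \times \left( \dfrac{R' - G'}{\Delta} + 4 \right), & C_{max} = B'
\end{cases}
$$

Here H is expressed in degrees; dividing by 360 gives the 0 to
1.0 range used in section 3.1. For gray pixels (\Delta = 0)
the hue is undefined and is set to 0 by convention.

B. Saturation Calculation

The saturation value can be calculated using the following
formula:

$$
S = \begin{cases}
0, & C_{max} = 0 \\
\dfrac{\Delta}{C_{max}}, & C_{max} \neq 0
\end{cases}
$$

C. Value calculation

The value is calculated using the following formula:

$$V = C_{max}$$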
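
The conversion steps of this section can be sketched in Python as follows. This is a minimal sketch that assumes the standard RGB-to-HSV formulas (hue in degrees, saturation and value in 0 to 1); the function name `rgb_to_hsv` and the choice of hue 0 for gray pixels are assumptions made for illustration.

```python
def rgb_to_hsv(r, g, b):
    """Convert 8-bit R, G, B (0..255) to (H in degrees, S in 0..1, V in 0..1)."""
    # Normalise the channels to the range 0..1.
    rp, gp, bp = r / 255.0, g / 255.0, b / 255.0
    cmax = max(rp, gp, bp)
    cmin = min(rp, gp, bp)
    delta = cmax - cmin

    # Hue: depends on which channel is the maximum.
    if delta == 0:
        hue = 0.0  # gray pixel: hue is undefined, 0 by convention
    elif cmax == rp:
        hue = 60.0 * (((gp - bp) / delta) % 6)
    elif cmax == gp:
        hue = 60.0 * (((bp - rp) / delta) + 2)
    else:
        hue = 60.0 * (((rp - gp) / delta) + 4)

    # Saturation: zero for black, otherwise delta relative to the maximum.
    sat = 0.0 if cmax == 0 else delta / cmax
    # Value: the maximum channel.
    val = cmax
    return hue, sat, val
```

For example, pure red (255, 0, 0) maps to a hue of 0 degrees, pure green to 120 degrees and pure blue to 240 degrees, each with saturation and value equal to 1.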



IV. HSV Color Space with Clustering Techniques

The proposed system consists of three modules: feature
extraction, clustering of images, and finding similar images.



Figure 3 - Block Diagram of Proposed System 

4.1. Feature Extraction 

An RGB color image set is converted into an HSV color image
set using the RGB-to-HSV conversion technique discussed in
section 3.2.1.
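
One possible way to turn the converted pixels into a feature vector is sketched below. The paper does not specify the feature layout, so the quantised HSV histogram, the bin counts (8, 3, 3) and the function name `hsv_histogram` are assumptions made for illustration; the conversion itself uses Python's standard `colorsys` module.

```python
import colorsys

def hsv_histogram(pixels, bins=(8, 3, 3)):
    """Normalised HSV histogram of an iterable of 8-bit (R, G, B) pixels."""
    h_bins, s_bins, v_bins = bins
    hist = [0.0] * (h_bins * s_bins * v_bins)
    count = 0
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        # min(...) keeps the boundary value 1.0 inside the last bin.
        hi = min(int(h * h_bins), h_bins - 1)
        si = min(int(s * s_bins), s_bins - 1)
        vi = min(int(v * v_bins), v_bins - 1)
        hist[(hi * s_bins + si) * v_bins + vi] += 1.0
        count += 1
    # Divide by the pixel count so images of different sizes are comparable.
    return [x / count for x in hist] if count else hist

# Hypothetical 2x2 "image": three reddish pixels and one blue pixel.
pixels = [(255, 0, 0), (255, 0, 0), (250, 5, 5), (0, 0, 255)]
features = hsv_histogram(pixels)
```

The resulting vector has one entry per (hue, saturation, value) bin and sums to 1, so it can be compared between images with a distance measure such as the Euclidean distance.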

4.2. Clustering of Images using K-Means Clustering 

The images in the database are clustered using a clustering
technique, the representative bin number of each cluster is
found, and the query image is compared only with the cluster
representatives. Given the number of clusters K, the K-Means
algorithm is carried out in three steps:

1. Initialisation: set K seed points and assign each object
to the cluster with the nearest seed point.

2. Compute new seed points as the centroids of the clusters
of the current partition (the centroid is the centre,
i.e., mean point, of the cluster).

3. Reassign each object to the cluster with the nearest seed
point and go back to Step 2; stop when no assignment
changes.
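
The three steps can be sketched in plain Python as follows. This is a minimal sketch, not the authors' implementation: the random choice of seed points, the `max_iter` cap and the function names are assumptions, and `math.dist` requires Python 3.8 or later.

```python
import math
import random

def nearest(point, centroids):
    """Index of the centroid closest to the point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def k_means(points, k, max_iter=100, seed=0):
    """Plain K-Means; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Step 1: pick k distinct points as seeds and make the first assignment.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [nearest(p, centroids) for p in points]
    for _ in range(max_iter):
        # Step 2: move each seed to the mean point of its cluster.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous seed
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        # Step 3: reassign; stop when no assignment changes.
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
    return centroids, labels

# Hypothetical 2-D feature vectors forming two well-separated groups.
points = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
          [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
centroids, labels = k_means(points, k=2)
```

Because the outcome depends on the seed points, a practical system would typically repeat the run with different seeds and keep the partition with the smallest total within-cluster distance.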




Figure 4 - Clustering Algorithm 
V. Result Discussion

Given a query image, the cluster whose features are closest
to the query image's feature vector is retrieved. The
efficiency of the proposed system is measured by the quality
of clustering and the correct retrieval of face images.
Compared with retrieval based on the Euclidean distance
similarity measure alone, clustering proved more efficient
in retrieving images. With the Euclidean distance metric
alone, irrelevant images that do not resemble the query image
are retrieved along with the similar ones, which leads to
confusion. The clustering technique minimizes the number of
irrelevant images and classifies images by the distance value
obtained from the clusters.
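
The two-step search described in this paper (nearest cluster first, then distances within that cluster) can be sketched as follows. The centroids, image names, feature vectors and the `top_n` parameter are hypothetical values chosen for illustration, and `math.dist` requires Python 3.8 or later.

```python
import math

def retrieve(query, centroids, clusters, top_n=3):
    """Two-step search: pick the nearest cluster, then rank only its members."""
    # Step 1: find the nearest cluster representative.
    best = min(range(len(centroids)), key=lambda i: math.dist(query, centroids[i]))
    # Step 2: compute distances only to the images assigned to that cluster.
    ranked = sorted(clusters[best], key=lambda item: math.dist(query, item[1]))
    return best, [name for name, _ in ranked[:top_n]]

# Hypothetical cluster representatives and their member images.
centroids = [[0.1, 0.5, 0.8], [0.6, 0.9, 0.3]]
clusters = [
    [("face_a", [0.08, 0.48, 0.82]), ("face_b", [0.12, 0.55, 0.75])],
    [("face_c", [0.58, 0.88, 0.33]), ("face_d", [0.65, 0.95, 0.25])],
]
query = [0.09, 0.5, 0.8]
cluster_id, results = retrieve(query, centroids, clusters)
```

Images in the other cluster are never compared with the query, which is how this scheme keeps irrelevant images out of the result list and reduces the number of distance computations.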



VI. Conclusion 

In this paper, an approach to content-based image retrieval
using HSV color features is proposed. K-Means clustering is
applied so that the images are initially clustered into
groups with similar HSV color content. The chosen group is
then clustered further using the K-Means clustering
algorithm. K-Means, a clustering method based on the
optimization of an overall measure of clustering quality, is
known for its efficiency in producing accurate results in
image retrieval. Since each cluster obtained is a unique set
of similar images, the user can select an image set of his or
her choice and further refine the search by applying the
K-Means technique. The images are retrieved efficiently and
classified according to the cluster distance value.